-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
feat(Vercel) Build cache improvements #15317
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Bundle ReportChanges will decrease total bundle size by 460.9kB (-1.96%) ⬇️. This is within the configured threshold ✅ Detailed changes
Affected Assets, Files, and Routes:view changes for bundle: sentry-docs-client-array-pushAssets Changed:
view changes for bundle: sentry-docs-server-cjsAssets Changed:
Files in
App Routes Affected:
|
- Use VERCEL_GIT_COMMIT_REF (branch name) in cache keys for cross-commit persistence - Include registry data hash in cache key to detect registry updates - Enable caching for 200+ platform-include files (previously skipped) - Add build timing instrumentation - Expected: 18 min → 2-3 min on first build, ~2 min on subsequent commits
| const leanHTML = rawHTML | ||
| // Remove all script tags (build IDs, chunk hashes, Vercel injections) | ||
| .replace(/<script[^>]*>[\s\S]*?<\/script>/gi, '') |
Check failure
Code scanning / CodeQL
Incomplete multi-character sanitization High
<script
| // Remove elements that change between builds but don't affect markdown output | ||
| const leanHTML = rawHTML | ||
| // Remove all script tags (build IDs, chunk hashes, Vercel injections) | ||
| .replace(/<script[^>]*>[\s\S]*?<\/script>/gi, '') |
Check failure
Code scanning / CodeQL
Bad HTML filtering regexp High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 3 days ago
The best way to fix this problem is to use a proper HTML parser to remove unwanted tags (such as <script>, <link>, and <meta>), rather than relying on regular expressions. This provides more robust handling of HTML's intricacies, such as extra whitespace, unusual attribute formatting, and invalid but tolerated browser syntax. Since the script already imports rehype-parse (for parsing HTML to a syntax tree) and other tools from the unified/rehype ecosystem, the fix can use these existing libraries.
Specifically, instead of using .replace(/<script[^>]*>[\s\S]*?<\/script>/gi, '') (and similar regex for <link> and <meta>), we should parse the HTML into an AST, programmatically remove the unwanted nodes, and then serialize the AST back to HTML for further processing. This fix should be applied within the genMDFromHTML function, replacing the leanHTML construction (lines 233–242) with parser-based routines.
No new dependencies are needed since rehype-parse, unist-util-remove, and related packages are already imported. We'll need to use unified().use(rehypeParse, {fragment: true}) to parse the HTML, use remove(tree, test) from unist-util-remove to strip undesired nodes, and a rehype serializer (e.g., rehype-stringify) to convert the AST back to HTML. If not already available, we should add a rehype-stringify import.
-
Copy modified line R29 -
Copy modified lines R234-R237 -
Copy modified lines R239-R271
| @@ -26,6 +26,7 @@ | ||
| import remarkStringify from 'remark-stringify'; | ||
| import {unified} from 'unified'; | ||
| import {remove} from 'unist-util-remove'; | ||
| import rehypeStringify from 'rehype-stringify'; | ||
|
|
||
| const DOCS_ORIGIN = 'https://docs.sentry.io'; | ||
| const CACHE_VERSION = 3; | ||
| @@ -230,17 +231,44 @@ | ||
|
|
||
| // Normalize HTML to make cache keys deterministic across builds | ||
| // Remove elements that change between builds but don't affect markdown output | ||
| const leanHTML = rawHTML | ||
| // Remove all script tags (build IDs, chunk hashes, Vercel injections) | ||
| .replace(/<script[^>]*>[\s\S]*?<\/script>/gi, '') | ||
| // Remove link tags for stylesheets and preloads (chunk hashes change) | ||
| .replace(/<link[^>]*>/gi, '') | ||
| // Remove meta tags that might have build-specific content | ||
| .replace(/<meta name="next-size-adjust"[^>]*>/gi, '') | ||
| // Remove data attributes that Next.js/Vercel add (build IDs, etc.) | ||
| .replace(/\s+data-next-[a-z-]+="[^"]*"/gi, '') | ||
| .replace(/\s+data-nextjs-[a-z-]+="[^"]*"/gi, ''); | ||
| // Remove all <script>, <link>, and next-size-adjust <meta> tags, as well as data-* attributes, using an HTML parser. | ||
| const parsedHtmlTree = unified() | ||
| .use(rehypeParse, {fragment: true}) | ||
| .parse(rawHTML); | ||
|
|
||
| // Remove unwanted elements using unist-util-remove | ||
| // Remove <script> tags | ||
| remove(parsedHtmlTree, (node) => node.type === 'element' && node.tagName === 'script'); | ||
| // Remove <link> tags | ||
| remove(parsedHtmlTree, (node) => node.type === 'element' && node.tagName === 'link'); | ||
| // Remove <meta name="next-size-adjust" ...> | ||
| remove(parsedHtmlTree, (node) => | ||
| node.type === 'element' && | ||
| node.tagName === 'meta' && | ||
| node.properties && | ||
| node.properties.name === 'next-size-adjust' | ||
| ); | ||
| // Remove data-next-* and data-nextjs-* attributes from all elements | ||
| function cleanseDataAttrs(node) { | ||
| if (node && node.type === 'element' && node.properties) { | ||
| Object.keys(node.properties).forEach((key) => { | ||
| if (/^data-next(-|js-)/.test(key)) { | ||
| delete node.properties[key]; | ||
| } | ||
| }); | ||
| } | ||
| if (node.children) { | ||
| node.children.forEach(cleanseDataAttrs); | ||
| } | ||
| } | ||
| cleanseDataAttrs(parsedHtmlTree); | ||
|
|
||
| // Convert AST back to HTML | ||
| const leanHTML = unified() | ||
| .use(() => (tree) => tree) // identity plugin since tree already processed | ||
| .use(rehypeStringify) | ||
| .stringify(parsedHtmlTree); | ||
|
|
||
| if (shouldDebug) { | ||
| console.log( | ||
| `✂️ Lean HTML length: ${leanHTML.length} chars (removed ${rawHTML.length - leanHTML.length} chars)` |
DESCRIBE YOUR PR
IS YOUR CHANGE URGENT?
Help us prioritize incoming PRs by letting us know when the change needs to go live.
SLA
Thanks in advance for your help!
PRE-MERGE CHECKLIST
Make sure you've checked the following before merging your changes:
LEGAL BOILERPLATE
Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. and is gonna need some rights from me in order to utilize my contributions in this here PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.
EXTRA RESOURCES